Artificial Intelligence in the Life Sciences — Latest Matching Preprints

1

Desktop-Scale Hit-Point Discovery for Intrinsically Disordered α-Synuclein Using State-Space Compression and a Discrete Phase-Interference Search Operator

Kim, D. H.; Khenmedekh, G.-O.; Park, I.; Kim, S.

2026-06-28 bioinformatics 10.64898/2026.06.22.733879 medRxiv

Top 0.1%

2.4%

The accessible chemical space dwarfs any tractable screening budget, and most artificial intelligence drug discovery pipelines respond by docking and ranking a small sublibrary. The resulting hit list is agnostic to selectivity, brain penetration, toxicity, synthetic accessibility, and chemical novelty. We present ISTP-DPISO DrugEngine, an end-to-end engine developed by ISTP Tech that integrates the Local Information Criticality Principle (LICP) with a Discrete Phase-Interference Search Operator (DPISO). We demonstrate the engine on the intrinsically disordered protein (IDP) α-synuclein, whose non-amyloid-component (NAC, residues 61-95) drives Parkinson-associated aggregation. An LICP-derived active set focuses the expensive LICP-DPISO scoring: in a production-scale run, the engine compressed a ~8.46 × 10⁸-molecule mirror to a 10,000,000-molecule active set (~85-fold) before scoring, then converged to a compact, safety-gated shortlist plus de novo designs. The entire campaign ran on a single desktop workstation, without any high-performance-computing cluster. Three engine-prioritized, commercially available candidates (2-D08, Uralenol, Herbacetin) and an (-)-epigallocatechin gallate (EGCG) positive control were then tested in a thioflavin-T (ThT) aggregation assay at 100 µM: all three engine-nominated candidates suppressed α-synuclein aggregation, giving perfect prospective inhibitor-call concordance (3/3 nominated); together with the EGCG positive control, all four assayed compounds inhibited aggregation (4/4 total), two by ≤80% plateau reduction. ISTP-DPISO DrugEngine reframes virtual screening from post-hoc score fusion to a single, state-space-compressed, safety-gated, experimentally validated discovery pipeline.

2

A control-validated pan-proteome deep-learning pipeline nominates GPR35 as a candidate target of the orphan bacterial metabolite ligiamycin A

Martin, J.

2026-07-06 bioinformatics 10.64898/2026.07.01.735807 medRxiv

Top 0.1%

1.8%

Most microbial natural products with documented bioactivity lack an identified molecular target, which limits their development. We present an open, control-validated computational pipeline for natural-product target hypothesis generation. It combines a pan-proteome deep-learning drug-target interaction (DTI) model (a graph neural-network ligand encoder, an ESM-2 protein language-model encoder, and bidirectional cross-attention) with bias-corrected ranking and control-anchored molecular docking. Applying it to ligiamycin A, a 2022-described Streptomyces/Achromobacter co-culture decalin-amino-maleimide with no reported target, we find that the predicted interactions of the compound are dominated by class-A G-protein-coupled receptors. Using a drug with a known target (losartan) we identify and correct a frequent-hitter bias in the raw model; after correction the standout candidates are uniformly class-A GPCRs, led by the orphan receptor GPR35. Structure-based docking with matched positive and negative controls across three candidates corroborates GPR35 specifically: ligiamycin A scores comparably to the known GPR35 agonist zaprinast at the agonist pocket (-8.1 vs -8.3 kcal/mol; non-binder floor -5.5), whereas FFAR1 is excluded and histamine H2 is inconclusive. We propose GPR35 as a prioritized, experimentally testable target and release the workflow as a reusable tool. The result is a computational hypothesis that requires experimental validation.
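
The abstract does not say how the frequent-hitter bias was corrected. One common way to remove it, sketched here purely as an assumption, is to standardize each protein's score for the query ligand against that protein's score distribution over a panel of unrelated reference ligands, so that proteins predicted to bind everything drop toward a z-score of zero. The function name, protein labels, and scores are hypothetical.

```python
from statistics import mean, pstdev


def debias_scores(query_scores, reference_scores):
    """Re-rank proteins by how unusual the query's score is for each protein.

    query_scores: {protein: predicted DTI score for the query ligand}
    reference_scores: {protein: [scores for a panel of unrelated ligands]}
    A protein that scores high for every ligand (a frequent hitter) gets a
    z-score near zero, so it no longer dominates the ranking.
    """
    z = {}
    for protein, score in query_scores.items():
        ref = reference_scores[protein]
        mu, sd = mean(ref), pstdev(ref)
        z[protein] = (score - mu) / sd if sd > 0 else 0.0
    return sorted(z.items(), key=lambda item: item[1], reverse=True)


# Hypothetical scores: P1 is a frequent hitter, P2 is selective.
query = {"P1": 0.90, "P2": 0.60}
panel = {"P1": [0.85, 0.90, 0.95], "P2": [0.10, 0.20, 0.30]}
ranking = debias_scores(query, panel)
print(ranking[0][0])  # P2
```

The raw score would rank P1 first; after standardization the selective protein P2 leads.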

3

Practical Use of Advanced AI Frameworks on Real-Life Scientific Problems: Three Case Studies

Gulluoglu, H. S. A.; Baby, J.; Bagul, K. M.; Basangari, B. R.; Bathini, S. A.; Chalamalla, N. K. R.; Dcunha, J.; Gupta, O.; Huang, L.; Jiang, X.; Naidu, Y. R.; Sathishkumar, G.; Sehrawat, M.; Thota, S. L.; Thuvara, D.; Vanguri, M. B.; Yin, J.; Jugder, B.-E.; Lusky, I. E.; Li, J.; Sinitskiy, A.

2026-06-29 bioinformatics 10.64898/2026.06.23.734132 medRxiv

Top 0.1%

1.7%

Agentic artificial intelligence (AI) systems increasingly claim to automate scientific research, yet independent evaluations report persistent gaps between those claims and demonstrated capability. We tested frontier agentic AI systems on three practical problems: prediction of treatment non-response in immune-mediated inflammatory diseases, optical chemical structure recognition for literature mining, and prediction of drug-design-related properties from small datasets. Each problem was first assigned to autonomous frameworks and then reattempted as human-led, AI-assisted work. Autonomous runs failed in most cases, while human-led work produced reusable resources and modest but defensible performance, including new evidence for possible mechanisms of treatment resistance and a more practical benchmark for mining chemical structures from scientific papers. Property prediction was the single task on which one autonomous AI framework matched the human expert. We conclude that current frameworks can carry out engineering and analysis once a human expert leads the project, but cannot yet engineer a novel solution without oversight. The use of AI on real-life scientific problems remains an art rather than a routine technology.

4

Quantum Encoding Strategies for Drug Response Prediction: An Exhaustive Benchmark on a 20-Qubit Superconducting QPU

Derouich, R.; Mathlouthi, N. E. H.

2026-07-13 bioinformatics 10.64898/2026.07.08.737310 medRxiv

Top 0.1%

1.2%

We present the first systematic, hardware-executed benchmark of twelve distinct quantum data-encoding strategies for drug-response prediction on a real superconducting quantum processing unit (QPU). All experiments were conducted on the IQM Garnet 20-qubit QPU via the IQM Resonance cloud platform, using the Qrisp quantum-software framework (v 0.8.2). Each encoding was evaluated on n = 50 stratified samples drawn from the Genomics of Drug Sensitivity in Cancer dataset (GDSC2, 242 036 drug-cell-line pairs), targeting the natural-log IC50 response variable. Variational weights were optimised offline with the gradient-free COBYLA algorithm before hardware submission. Every circuit was executed with 1024 shots; the regression signal is the Pauli-Z expectation value of qubit 0, ⟨Z₀⟩. Results show that the QAOA-inspired encoding achieves the best RMSE of 3.314 and is statistically superior (p < 0.05, Wilcoxon signed-rank test) to six of the remaining eleven encodings. Hardware-efficient entanglement structures, specifically alternating cost and mixer layers, provide a systematic advantage over purely rotational or diagonal encodings under realistic noise conditions. This work constitutes a reproducible baseline for noise-aware quantum machine learning on pharmaceutical data; all code, data, and raw QPU outputs are publicly released.
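
To make the regression signal concrete: with a finite number of shots, ⟨Z₀⟩ is estimated from counts as (n₀ − n₁)/shots, where n₀ and n₁ count the shots in which qubit 0 read 0 or 1. The sketch below assumes bitstrings are keyed with qubit 0 as the rightmost character; that ordering is an assumption and should be checked against the framework actually used (this is plain Python, not the Qrisp API).

```python
def z0_expectation(counts):
    """Estimate <Z0> from measurement counts.

    counts: {bitstring: number of shots}. Assumes qubit 0 is the rightmost
    character (little-endian ordering). Outcome 0 on qubit 0 contributes +1,
    outcome 1 contributes -1.
    """
    shots = sum(counts.values())
    n_zero = sum(c for bits, c in counts.items() if bits[-1] == "0")
    return (n_zero - (shots - n_zero)) / shots


# Hypothetical 1024-shot result for a single-qubit readout
print(z0_expectation({"0": 700, "1": 324}))  # 0.3671875
```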

5

ADMET Property Prediction with Quantum-Inspired Preprocessing

Mansour, B.; Rafaelyan, G.

2026-07-05 bioinformatics 10.64898/2026.06.30.735582 medRxiv

Top 0.2%

1.1%

Accurate prediction of Absorption, Distribution, Metabolism, Excretion, and Toxicity (ADMET) properties is a central challenge in early-stage drug discovery, where experimental determination remains costly and time-consuming. In this work, we propose a quantum-inspired preprocessing framework in which statistical dependencies among molecular descriptors are encoded into a parameterised many-body Hamiltonian, and the expectation values obtained by simulating its time evolution serve as additional inputs to a gradient-boosted ensemble model (CatBoost). Mutual information (MI) is used both to select the most informative descriptors and to set the coupling strengths of the Hamiltonian, so that the induced entanglement structure reflects empirically measured feature correlations; the evolution is realised with a short digitised-counterdiabatic schedule that generates a compact set of expectation-value features while keeping the circuit shallow. The resulting quantum-derived feature vectors are concatenated with the full MapLight descriptor set (ECFP, Avalon, and ErG fingerprints together with RDKit physicochemical properties) before training. We evaluate the pipeline on the AqSolDB aqueous solubility benchmark from the Therapeutics Data Commons (TDC) platform, achieving a mean absolute error (MAE) of 0.746 +/- 0.006 log(mol/L), which is within the reported error bars of the current top-performing model on the TDC leaderboard (MAE = 0.741 +/- 0.013). Ablation experiments show that the quantum-derived features match classical second-degree polynomial interaction features derived from the same MI-selected subset, while forming a far more compact representation (85 quantum features versus up to 4,950 polynomial terms, an approximately 58-fold reduction). SHapley Additive exPlanations (SHAP) analysis identifies the physicochemical drivers of solubility predictions, offering interpretable insight into model behaviour.
These results demonstrate that MI-guided Hamiltonian feature extraction can reproduce the performance of strong classical interaction models on aqueous solubility while generating a compact, interpretable feature representation that is compatible with future quantum execution.
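
The "up to 4,950 polynomial terms" figure is consistent with pairwise products of 100 descriptors, since C(100, 2) = 4,950; the subset size of 100 is inferred from that arithmetic, not stated in the abstract. Including squared terms as well would give 5,050. The short check below reproduces the approximately 58-fold reduction under that assumption.

```python
from math import comb

n_selected = 100  # assumed size of the MI-selected descriptor subset
n_quantum = 85    # quantum-derived features reported in the abstract

n_pairwise = comb(n_selected, 2)  # x_i * x_j terms with i < j
reduction = n_pairwise / n_quantum

print(n_pairwise, round(reduction, 1))  # 4950 58.2
```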

6

An Integrated Knowledge Graph and Network Medicine Pipeline for Drug Repurposing: Benchmarking Across Human Diseases and Application to Amyotrophic Lateral Sclerosis

Jiang, A.; Hu, J.; Abdulle, Y.; Pain, O.; Iacoangeli, A.

2026-07-08 bioinformatics 10.64898/2026.07.03.736387 medRxiv

Top 0.2%

1.1%

Drug repurposing offers a practical strategy to identify new therapeutic uses for approved drugs, potentially reducing the time and cost associated with conventional drug development. We present a novel three-stage drug repurposing pipeline that integrates knowledge graph-based gene prediction, network-based drug-disease association analysis, and systematic classification of candidate drugs by therapeutic class. The pipeline integrates DGLinker to predict novel disease-associated genes, SAveRUNNER to identify drug repurposing candidates, and ATC Category Enrichment Analysis (ATCEA) to prioritise candidates by pharmacological class. We benchmarked the pipeline across twelve diseases using DrugBank and MEDI2-HPS as validation resources. Utilising DGLinker-expanded disease-gene sets as input increased the number of predicted repurposed drugs, while overall discriminative performance remained stable across diseases (AUROC 0.71-0.77). Application of ATCEA consistently improved precision, F1-score, and specificity, while reducing recall, reflecting a conservative prioritisation strategy that contracts the candidate space while retaining pharmacologically coherent drug-disease candidates. We further applied the pipeline to amyotrophic lateral sclerosis (ALS), a neurodegenerative disease with limited therapeutic options, and performed a deeper literature-based validation of the results. Incorporation of DGLinker-predicted genes substantially increased the number of significant candidate drugs and uncovered enriched ATC categories not identified using known ALS genes alone, including antidepressants and antipsychotics. Moreover, several drugs with supporting evidence available in the literature were identified only when DGLinker-predicted genes were used. Overall, 77 candidate drugs were prioritised within significantly enriched ATC categories, several of which are supported by previously published studies. 
To provide exploratory real-world support for these findings, we further evaluated candidate drugs in a longitudinal electronic health record (EHR) dataset of 2361 patients with ALS from King's College Hospital. Although the number of evaluable drugs was limited due to sample size, the EHR analysis provided additional clinically relevant context for selected prioritised drugs and pharmacological classes. Our pipeline demonstrates potential to accelerate drug repurposing by integrating complementary computational approaches to each step of the process, providing an end-to-end framework that showed robust performance across benchmarking experiments and use cases.
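
The abstract does not specify the statistic behind ATC Category Enrichment Analysis. A one-sided hypergeometric test is a standard choice for this kind of over-representation question, and a minimal version is sketched below as an assumption, not as the ATCEA implementation. The counts in the usage line are hypothetical.

```python
from math import comb


def enrichment_p_value(k, n, K, N):
    """One-sided hypergeometric P(X >= k).

    N: drugs in the background set
    K: background drugs belonging to the ATC category
    n: drugs predicted as repurposing candidates
    k: predicted candidates that belong to the ATC category
    """
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / total


# Hypothetical numbers: 12 of 40 candidates fall in a category that holds
# 60 of 1,000 background drugs (about 2.4 expected by chance).
p = enrichment_p_value(k=12, n=40, K=60, N=1000)
print(p < 1e-3)  # True
```

In practice the p-values across all categories would then be corrected for multiple testing before calling a category "significantly enriched".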

7

BoltzProt-1: Towards Efficient De Novo Binder Design with Good Developability

Ucar, T.; Bates, J.; Fu, Y.; Shi, W.; Stark, H.; Nava, D.; Cavalleri, L.; Wohlwend, J.; Corso, G.; Passaro, S.

2026-06-27 bioinformatics 10.64898/2026.06.23.733997 medRxiv

Top 0.2%

1.1%

Designing binders against novel protein targets remains a central challenge in computational drug discovery. Here we introduce BoltzProt-1, a pipeline for generating protein binders, including nanobodies, with improved hit rates and favorable developability properties. At its core lie a refined iteration of BoltzGen's generative model and a novel protein-protein interaction prediction model, BoltzPPI. Employing BoltzPPI instead of BoltzGen's standard structure-prediction confidence metrics to rank nanobody (VHH) designs increases the confirmed-binder hit rate from 3.3% to 8.0% across 10 novel targets. Assessed on 10 additional targets used in prior literature, the BoltzProt-1 pipeline obtains nanobody screening hits for 7 of 10 targets, surpassing the 6 of 10 previously reported by Chai-2. Finally, evaluating the developability of BoltzProt-1-designed nanobodies in terms of stability, aggregation, purity, polyspecificity and hydrophobicity reveals that 58% of its confirmed binders pass every criterion, exceeding both BoltzGen (40%) and clinical-stage VHH controls (21%).

8

Brain folding as a Fourier series yields a developmental clock

Goldschmidt, E.

2026-07-09 developmental biology 10.64898/2026.07.07.737104 medRxiv

Top 0.2%

1.0%

The human cerebral cortex folds into a stereotyped shape during gestation. Different principles govern the large and small scales of the final brain geometry. Here, I show that the fetal cerebrum can be described as a band-limited spherical-harmonic Fourier object whose entire gyrification process collapses to a single one-dimensional curve, in which the maximum harmonic degree acts as a developmental coordinate. The closed-form descriptor predicts gestational age with a mean absolute error of 0.13 and 0.38 weeks across fetal brain atlases, exceeding the published learning-based state of the art by a factor of three to seven. The same descriptor, applied to single subjects in the FeTA pathological dataset, can classify each subject's distance from the normative trajectory and discriminate pathological from neurotypical fetuses. The result is a single closed-form, zero-training-cost descriptor that simultaneously dates the fetal brain and detects atypical development.
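
The idea of a "maximum harmonic degree" as a coordinate can be illustrated with a 1-D analogue: for a periodic signal on a circle, find the smallest harmonic L whose band-limited truncation retains a chosen fraction of the non-constant energy. This is only an analogy built with numpy's real FFT, not the spherical-harmonic descriptor of the preprint; the 99% energy threshold and the function name are hypothetical.

```python
import numpy as np


def max_harmonic_degree(signal, energy_fraction=0.99):
    """Smallest L such that harmonics 1..L hold `energy_fraction` of the
    non-constant energy of a real, periodic 1-D signal.

    A 1-D Fourier analogue of band-limiting; not a spherical-harmonic fit.
    """
    n = len(signal)
    power = np.abs(np.fft.rfft(signal)) ** 2
    weights = np.full(power.shape, 2.0)  # each k > 0 also has a -k partner
    if n % 2 == 0:
        weights[-1] = 1.0                # the Nyquist term has no partner
    energy = (power * weights)[1:]       # drop k = 0 (the mean)
    cumulative = np.cumsum(energy) / energy.sum()
    # First index where the cumulative fraction reaches the target; +1 maps
    # the array index back to the harmonic number.
    return int(np.searchsorted(cumulative, energy_fraction) + 1)


theta = np.linspace(0, 2 * np.pi, 64, endpoint=False)
shape = np.cos(2 * theta) + 0.1 * np.cos(10 * theta)  # coarse fold + fine ripple
print(max_harmonic_degree(shape, 0.99), max_harmonic_degree(shape, 0.999))  # 2 10
```

A more finely folded shape needs a higher L to reach the same threshold, which is the sense in which L can track folding progress.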

9

Identifying and Addressing Systematic Data Leakage in Protein-Ligand Affinity Benchmarks

Mattsson, B.; Walters, W.

2026-06-30 Molecular Biology 10.64898/2026.06.29.735309 medRxiv

Top 0.2%

1.0%

Accurate prediction of protein-ligand binding affinity is a crucial goal in structure-based drug discovery, with the potential to significantly shorten development timelines. Recently, a new wave of machine learning models based on co-folding, such as Boltz-2 and IsoDDE, has demonstrated performance that matches or exceeds that of gold-standard physics-based methods like Free Energy Perturbation (FEP). This paper provides a critical assessment of these claims, revealing that current benchmarks are heavily influenced by data leakage, and proposes a new benchmark that explicitly controls for data leakage. We demonstrate that splitting by protein-sequence identity is inherently insufficient to prevent data leakage due to "target mirroring," in which homologous proteins with low overall sequence identity still exhibit highly correlated binding profiles. Our meta-analysis of documents in the ChEMBL 36 database identifies more than 6,000 such assay pairs and finds that leakage persists for sequence-identity thresholds as low as 0.2, well below the values commonly used in benchmarks today. Additionally, we show that a ligand-only baseline model, which lacks protein structural information, achieves surprisingly high performance on the FEP+ 4 and OpenFE benchmarks (r = 0.66 and r = 0.36, respectively). Our results indicate that current benchmarks tend to reward models for memorizing training data and exploiting localized leakage rather than truly learning biophysical principles. To address this issue, we propose the Novelty-Tiered Affinity Benchmark, in which the test data is partitioned into ligand novelty tiers. In the most challenging tier (Tanimoto similarity < 0.35), ligand-only models perform notably worse (r = 0.14), offering a clear baseline for evaluating genuine generalization. We argue that the field must move beyond sequence-based splits to ensure that AI-driven discovery translates into successful prospective laboratory research.
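
Novelty tiers of this kind are usually assigned from each test ligand's maximum Tanimoto similarity to the training set. The sketch below represents fingerprints as sets of on-bit indices and uses the 0.35 cutoff quoted for the hardest tier; the second cutoff (0.6) and the tier names are hypothetical placeholders, not the benchmark's definitions.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity between two on-bit sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def novelty_tier(test_fp: set, train_fps: list, cutoffs=(0.35, 0.6)) -> str:
    """Assign a tier from the maximum similarity to any training ligand.

    cutoffs[0] = 0.35 follows the abstract; cutoffs[1] is a hypothetical value.
    """
    max_sim = max((tanimoto(test_fp, t) for t in train_fps), default=0.0)
    if max_sim < cutoffs[0]:
        return "novel"
    if max_sim < cutoffs[1]:
        return "intermediate"
    return "near"


train = [{1, 2, 4}, {10, 11, 12}]
print(novelty_tier({1, 2, 3}, train), novelty_tier({7, 8}, train))  # intermediate novel
```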

10

A foundation model enables prediction of natural product molecular properties, bioactivity, and structural similarity from biosynthetic gene cluster sequence

Walker, A.

2026-07-07 bioinformatics 10.64898/2026.07.05.736569 medRxiv

Top 0.2%

1.0%

Genome mining is a powerful technique in natural product discovery, where biosynthetic gene clusters that are likely to produce novel or desirable natural products are identified through bioinformatic analysis. There are many more predicted biosynthetic gene clusters than can easily be experimentally characterized. Additional computational methods to prioritize biosynthetic gene clusters by the bioactivity, structural properties, or novelty of the product would make genome mining more efficient. Multiple machine learning/artificial intelligence models have been developed to predict product properties from biosynthetic gene cluster sequence, but they are limited by small quantities of training data. Model pretraining with unlabeled data is a powerful technique to develop models that can learn on a limited amount of labeled training data. Biosynthetic gene clusters are well suited to this strategy because there are many predicted clusters with only a small percentage being characterized. This paper reports BGC-MLM, a foundation model that is pretrained with a masked language task on predicted biosynthetic gene clusters and then fine-tuned for downstream applications including prediction of product structural class, bioactivity, chemical properties, counts of functional groups, and chemical fingerprint. Comparison to a model trained without pretraining shows that pretraining generally improves performance. BGC-MLM shows better or similar performance to existing specialized methods for these tasks, demonstrating its utility as a foundation model for natural product genome mining.
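
Masked-language pretraining hides a fraction of input tokens and trains the model to recover them. The sketch below shows the widely used BERT-style masking rule (select 15% of tokens; of those, 80% become [MASK], 10% a random token, 10% unchanged). Whether BGC-MLM uses exactly this rule, and how it tokenizes a gene cluster, are not stated in the abstract; the domain-level tokens here are hypothetical.

```python
import random

MASK = "[MASK]"


def mask_tokens(tokens, vocab, rate=0.15, rng=None):
    """BERT-style masking: returns (model inputs, labels).

    Each token is selected with probability `rate`; a selected token is
    replaced by [MASK] 80% of the time, by a random vocabulary token 10% of
    the time, and left unchanged 10% of the time. Labels hold the original
    token at selected positions and None elsewhere (not scored by the loss).
    """
    if rng is None:
        rng = random.Random(0)
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < rate:
            labels.append(tok)
            r = rng.random()
            if r < 0.8:
                inputs.append(MASK)
            elif r < 0.9:
                inputs.append(rng.choice(vocab))
            else:
                inputs.append(tok)
        else:
            inputs.append(tok)
            labels.append(None)
    return inputs, labels


# Hypothetical domain-level tokens for one biosynthetic gene cluster
cluster = ["PKS_KS", "PKS_AT", "PKS_KR", "ACP", "Thioesterase", "ABC_transporter"]
inputs, labels = mask_tokens(cluster, vocab=cluster, rate=0.3)
```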

11

Improving Generalizability in Whole-Cell Antibiotic Discovery Through Active Learning

Serrano, L. R.; Zhou, A.; Wei, Z.; Stocks, K.-L. K.; Ektefaie, Y.; Gwynne, P. J.; Chen, E.; Krieger, I.; Sacchettini, J.; Aldridge, B.; Hu, L. T.; Farhat, M. R.

2026-07-05 bioinformatics 10.64898/2026.07.04.736489 medRxiv

Top 0.2%

1.0%

Machine learning (ML) has accelerated molecular discovery, yet training models to generalize to out-of-distribution (OOD) chemical spaces remains fundamentally constrained by the high cost of experimental validation. In antibiotic discovery, where whole-cell phenotypic high throughput screening (HTS) is resource-intensive, iterative ML-guided compound selection, or Active Learning (AL), offers a pathway to efficiently navigate available chemical spaces. However, the algorithmic tradeoffs between prioritizing compound novelty (exploration) and predicted bioactivity (exploitation), and their impact on OOD generalizability, remain unresolved for noisy, whole-cell biological systems. In this work, we systematically evaluate three AL strategies for whole-cell bacterial bioactivity and benchmark their effects on model accuracy, hit rate, and OOD performance. Using retrospective simulations on Mycobacterium tuberculosis HTS data, we identify an optimal AL strategy that balances predicted hit/non-hit novelty with overall hit rate. We then integrate the strategy in a closed-loop Borrelia burgdorferi antibiotic discovery HTS campaign. The AL-guided approach successfully increased the experimental screening hit rate five-fold (from a 0.2% rate within investigator-selected plates to 1.0%). Further, when the trained model was applied in prospective in silico selection of highly diverse compounds across multiple bacterial species, the AL-trained whole-cell inhibition predictor demonstrated 53-fold enrichment over investigator-directed screening (11.0% experimental validation of predicted hits). Of these validated hits, 100% demonstrated the intended narrow-spectrum activity for Borrelia burgdorferi. These results demonstrate that calibrated AL strategies can overcome data acquisition bottlenecks and train generalizable property predictors able to extrapolate to OOD molecules.
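
The exploration/exploitation tradeoff can be made concrete with a simple acquisition score that blends predicted activity with novelty (one minus the maximum similarity to already-screened compounds). This linear blend is a hypothetical illustration; the abstract does not describe the three strategies that were compared, and the weight `alpha` and pool values are made up.

```python
def acquisition_score(p_active, max_sim_to_train, alpha=0.5):
    """Blend exploitation (predicted activity) with exploration (novelty)."""
    novelty = 1.0 - max_sim_to_train
    return alpha * p_active + (1.0 - alpha) * novelty


def select_batch(candidates, k, alpha=0.5):
    """Return the ids of the k candidates with the highest acquisition score."""
    ranked = sorted(
        candidates,
        key=lambda c: acquisition_score(c["p_active"], c["max_sim"], alpha),
        reverse=True,
    )
    return [c["id"] for c in ranked[:k]]


pool = [
    {"id": "A", "p_active": 0.90, "max_sim": 0.95},  # likely hit, near-duplicate
    {"id": "B", "p_active": 0.40, "max_sim": 0.20},  # novel scaffold
    {"id": "C", "p_active": 0.10, "max_sim": 0.90},
]
print(select_batch(pool, 1, alpha=1.0), select_batch(pool, 1, alpha=0.0))  # ['A'] ['B']
```

`alpha = 1` is pure exploitation and `alpha = 0` pure exploration; a calibrated strategy sits somewhere in between.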

12

Pharmacological Stratification of Public Bioactivity Databases: A Reusable, OECD-Anchored Curation and Benchmarking Framework Demonstrated for Opioid Receptors

Nael, M.; Alakonda, L.; Ghosh, A.; Ward, S. J.; Liu-Chen, L.-Y.; Rajadhyaksha, A. M.; Abou-Gharbia, M.; Elokely, K. M.

2026-06-24 bioinformatics 10.64898/2026.06.18.732083 medRxiv

Top 0.2%

0.9%

Public bioactivity databases are heterogeneous not only in measurement type, where binding affinities and functional potencies are reported on different scales, but in pharmacology: the same compound and target can carry agonist, antagonist, or inhibitor records measured through binding displacement, cAMP, β-arrestin, or [³⁵S]GTPγS readouts that quantify different biological events. Pooling these records produces models whose output is detached from any coherent pharmacological claim. Prior work has standardized bioactivity at scale and quantified the noise from mixing measurement types, but pharmacological mechanism and assay-readout class have not been treated as a primary axis of large-scale curation. This study presents an auditable, OECD-anchored framework that stratifies public records by action type and assay readout before modeling, converting heterogeneous data into externally validated, interpretable QSAR tasks that compose with existing standardization resources rather than replacing them. The framework is demonstrated on the four opioid receptors (MOR, DOR, KOR, and nociceptin/orphanin FQ, NOP). Four public sources were reconciled into 72,148 merged records and 50,977 curated measurements spanning 19,585 compounds, each carrying auditable attributes for source agreement, endpoint meaning, pharmacology class, assay readout, and trust tier. Receptor-level binding tasks formed a compact benchmark with strong locked external performance, including KOR pK (R² = 0.79, n = 798) and DOR pK (R² = 0.77, n = 736). Pharmacology- and readout-resolved functional endpoints yielded externally validated strata that pooled labels would obscure, including a MOR antagonist functional-inhibition endpoint (R² = 0.86, n = 110) and agonist potency endpoints for DOR, KOR, and MOR (R² up to 0.81).
Comparison against a fully pooled baseline shows that pooled models either match stratified models on coherent endpoints or reach a deceptively high R² on functional-IC50 endpoints by training predominantly on binding-displacement records, so the pooled number predicts affinity rather than functional activity. SHAP attribution indicates that binding and functional potency encode partially distinct structure-activity signals. The dataset contract, not model performance alone, defines the validity and scope of a QSAR claim, and stratification is a precondition for a functional model to support a defensible claim. Curation logic, derived tables, frozen data, and reproducibility artifacts are released.
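
The core curation step, stratifying records by action type and assay readout before any model is trained, amounts to grouping on a composite key. The field names and records below are hypothetical; the framework's actual schema and trust-tier logic are not described in the abstract.

```python
from collections import defaultdict


def stratify(records):
    """Group bioactivity records by (target, action type, assay readout)."""
    strata = defaultdict(list)
    for rec in records:
        strata[(rec["target"], rec["action_type"], rec["readout"])].append(rec)
    return dict(strata)


# Hypothetical records for one receptor
records = [
    {"target": "MOR", "action_type": "agonist", "readout": "cAMP", "pvalue": 7.9},
    {"target": "MOR", "action_type": "agonist", "readout": "beta-arrestin", "pvalue": 7.1},
    {"target": "MOR", "action_type": "antagonist", "readout": "GTPgammaS", "pvalue": 8.4},
    {"target": "MOR", "action_type": "agonist", "readout": "cAMP", "pvalue": 8.2},
]
for key, group in stratify(records).items():
    print(key, len(group))
```

Each resulting stratum becomes its own QSAR task, instead of one pooled label that mixes agonist potency with antagonist inhibition.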

13

Real Science Is Harder Than Benchmarks: Evaluating Advanced AI Frameworks on Published Studies. I. Uncertainty Quantification, ML on Therapeutic Data Commons, and Agent-Based Modeling

Ahmed, M. O.; Amale, S. A.; Bhavsar, R. D.; Chopra, P.; Jaimes, A.; Kachhwah, A.; Kalotra, C. D.; Li, P.; Li, X.; Liao, Y.; Roy, R.; Senthilselvan, N.; Shao, Y.; Sharma, A. D.; Shrivatsan, A.; Xue, R.; You, Y.; Badkul, A.; Xie, L.; Oet, M.; Lee, K.; Sinitskiy, A.

2026-06-27 bioinformatics 10.64898/2026.06.24.734302 medRxiv

Top 0.3%

0.8%

Artificial Intelligence (AI) frameworks for automating scientific research have shown strong performance on benchmarks, but their capacity to routinely reproduce results from multiple real-life published studies remains largely untested. We evaluated five advanced AI research frameworks (Kosmos, K-Dense, ToolUniverse, BioAgents from bio.xyz, and the AI Scientist-v2 from Sakana AI) on three real-life tasks (including two recently published papers) spanning uncertainty quantification for molecular property predictions, machine learning on Therapeutic Data Commons benchmarks, and agent-based modeling. AI frameworks demonstrated genuine strengths: generating original hypotheses, competently executing routine data acquisition and coding tasks, providing statistical measures of confidence often absent from the original papers, and producing well-formatted final reports. At the same time, our experiments revealed that real-world scientific tasks remain considerably harder than current benchmarks suggest. No AI framework matched the scope or depth of the original studies, results varied across multiple runs of the same framework with the same prompt, and we documented cases of severe hallucinations in final reports, gaps in literature coverage, and overconfident conclusions. Verification of AI outputs required substantial domain expertise. While these three tasks are only partially representative of the broader scientific landscape, they offer a starting point for developing a more rigorous methodology for evaluation of AI performance than what is currently practiced. We conclude that AI frameworks are already valuable for prototyping research directions and stress-testing completed studies, and some of the limitations documented here appear largely tractable through infrastructure improvements and continued development.

14

BBBP_Atlas: Unified Interpretable Modeling of Blood Brain Barrier Permeability across Small Molecules and Peptides

Shen, X.; Su, Q.; Luo, H.; Gou, Q.; Ge, J.; Hou, T.; Wang, J.; Kang, Y.

2026-07-09 bioinformatics 10.64898/2026.07.06.736742 medRxiv

Top 0.3%

0.6%

Accurate prediction of blood-brain barrier permeability (BBBP) is essential for central nervous system drug discovery, yet existing models are often limited by their reliance on predefined physicochemical descriptors, small-molecule-centered training sets, or conformation-dependent representations, which restricts their transferability across chemically diverse modalities, especially peptides. In addition, publicly available BBBP datasets remain fragmented, inconsistently standardized, and weakly controlled for molecular redundancy, increasing the risk of data leakage and overestimated model performance. In this study, we propose BBBP-Atlas, a structure-aware BBB permeability prediction model designed for unified modeling of small molecules and peptides, together with OmniBBBP, the first cross-modal BBBP dataset. Designed to bypass descriptor and conformation dependencies, our model represents standardized molecular structures as atom-level graphs to capture local atom-bond environments and long-range topological dependencies associated with BBB transport. This design enables direct learning of structure-permeability relationships from molecular topology. For model training and evaluation, we curated OmniBBBP, a cross-modal, redundancy-filtered database that unifies small molecules and complex peptides, containing 10,218 unique compounds (9,316 small molecules and 902 peptides). BBBP-Atlas achieved an accuracy of 0.8914 and an MCC of 0.7678 on the independent test set. On a balanced external benchmark of 200 compounds, our model reached an AUC of 0.9108, an accuracy of 0.8500, and an MCC of 0.7000, outperforming LightBBB by an absolute MCC gain of 6%. Case studies further showed that BBBP-Atlas captured clinically meaningful BBB permeability patterns, correctly identifying lorlatinib as BBB-permeable and vancomycin as BBB-impermeable with high confidence.
The OmniBBBP-backed BBBP-Atlas offers a versatile and cross-modal approach for single-compound prediction, batch screening, and dataset exploration for CNS drug discovery. BBBP-Atlas is available at https://cadd.drugflow.com/bbbp/.
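
For readers less familiar with the Matthews correlation coefficient (MCC), the sketch below computes it from a confusion matrix. The matrix used is hypothetical: on a balanced 200-compound set, symmetric errors of 15 per class happen to reproduce the reported accuracy of 0.85 and MCC of 0.70, but the preprint's actual confusion matrix is not given in the abstract.

```python
from math import sqrt


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical balanced 200-compound split with symmetric errors
tp, tn, fp, fn = 85, 85, 15, 15
accuracy = (tp + tn) / (tp + tn + fp + fn)
score = mcc(tp, tn, fp, fn)
print(accuracy, score)  # 0.85 0.7
```

Unlike accuracy, MCC stays near zero for a classifier that always predicts the majority class, which is why it is reported alongside accuracy for imbalanced test sets.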

15

Retention, not flux: endpoint confounding caps computational prediction of peptide skin penetration, with a delivery-aware reframing

Komianos, N.; Prakash, P.

2026-06-29 bioinformatics 10.64898/2026.06.25.734657 medRxiv

Top 0.3%

0.6%

Bioactive peptides are now central to cosmetic and dermatological actives, yet predicting whether a given sequence will reach its site of action in skin remains unsolved. We contend that the dominant framing, predicting a single binary "skin permeability" label from sequence, is ill-posed, and that this, rather than a shortage of modelling power, explains the field's stalled predictive performance. The scope of the claim is narrow: barrier-crossing propensity is a legitimate, learnable function of molecular structure, whereas the vehicle- and endpoint-agnostic binary label that the literature supplies is not. We support this with a first-principles analysis and a study of public-source data. First, the experimental endpoint most commonly reported, transdermal flux into a diffusion-cell receptor compartment (OECD Test Guideline 428), conflates two opposite outcomes (genuine deep delivery and undesired systemic transport) and is, for a cosmetic active, frequently a failure signal rather than a success signal. That receptor flux is an imperfect measure of cutaneous bioavailability is long established in dermatopharmacokinetics; our contribution is to show that the same confound, inherited through scraped labels, is what caps machine learning from sequence. Second, reported "permeability" is a property of the sequence × delivery-vehicle × measurement-compartment triad, two terms of which are usually unrecorded. Third, on public-source data, a physicochemical intrinsic-permeability estimate (Potts-Guy) carries no positive predictive signal for scraped penetration labels (grouped AUC 0.45, 95% CI 0.40-0.51); sequence-only classifiers plateau in the mid-0.70s with diminishing returns as labels accumulate (AUC 0.70-0.77); and the same descriptor pipeline on a clean single-endpoint membrane dataset scores materially higher (AUC 0.83, non-overlapping CI).
Our proposed reframing separates barrier-crossing (data-driven, sequence-level) from depth-and-retention (physics-driven, delivery-aware) and treats intrinsic transdermal flux as a regulatory risk axis; we close by proposing a triad-annotated reporting schema and a seed benchmark.
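
The Potts-Guy estimate referred to above is a two-descriptor regression for the skin permeability coefficient. The sketch uses the form commonly quoted from Potts and Guy (1992), log kp (cm/s) = -6.3 + 0.71 log Kow - 0.0061 MW; the preprint may use a different unit convention or refit, so the coefficients should be checked against the original. A rank-based AUC is included to show how such a score is evaluated against binary labels; the "grouped" AUC in the abstract additionally handles sequence groups, which this plain version does not.

```python
def potts_guy_log_kp(log_kow, mw):
    """log10 skin permeability coefficient kp (cm/s), commonly quoted
    Potts-Guy form; verify coefficients against the source before use."""
    return -6.3 + 0.71 * log_kow - 0.0061 * mw


def rank_auc(scores, labels):
    """AUC as the probability a positive outranks a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical peptides as (log Kow, MW) with scraped penetration labels
peptides = [(-1.0, 500.0), (-3.0, 900.0), (0.5, 400.0), (-2.0, 700.0)]
labels = [1, 0, 0, 1]
scores = [potts_guy_log_kp(lk, mw) for lk, mw in peptides]
print(rank_auc(scores, labels))
```

An AUC below 0.5, as reported in the abstract, means the physicochemical estimate ranks labelled "penetrating" peptides no better than chance.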

16

Assessing AI and Neurologist Diagnostic Reasoning Against Neuropathological Ground Truth

Leng, Y.; Noori, A.; Dickson, J. R.; Serrano-Pozo, A.; Avetisyan, M.; Rodriguez, D.; Rosenberg, E. S.; He, Y.; Oakley, D. H.; Khurana, V. S.; Hyman, B. T.; Frosch, M. P.; Das, S.

2026-07-10 neurology 10.64898/2026.07.07.26356930 medRxiv

Top 0.3%

0.6%


BACKGROUND Accurate differential diagnosis of complex neurological disorders remains challenging due to overlapping clinical features and heterogeneous disease presentations. Although large language models (LLMs) show promise in clinical reasoning, prior studies have benchmarked performance against clinician consensus rather than biological ground truth. A neuropathologically confirmed benchmark dataset for evaluating diagnostic AI in neurology is currently lacking. METHODS We introduce NeuroBench, a curated benchmark of complex neurological cases with neuropathologically confirmed gold-standard diagnoses, and DIAGNO, a confidence-aware LLM-based system for neurological diagnosis. NeuroBench comprises 203 retrospective case summaries from the Massachusetts General Hospital Brain Cutting Conference with corresponding autopsy-confirmed diagnoses. DIAGNO generated top-3 differential diagnoses, employing retrieval-augmented generation (RAG) for lower-confidence cases. Performance was assessed by three independent blinded adjudicators who evaluated both DIAGNO and neurologists against neuropathological ground truth. RESULTS NeuroBench encompassed 79 unique neuropathological diagnoses, spanning cerebrovascular disease, brain tumors, neurological infections, and various neurodegenerative and inflammatory disorders. DIAGNO matched or outperformed neurologists in top-3 accuracy (0.67 versus 0.63) and taxonomy-level accuracy (0.74 versus 0.66). In cases of disagreement, DIAGNO was more often correct than neurologists (29 versus 19 cases). Diagnostic concordance between DIAGNO and neurologists was high (90% agreement in top-3 predictions), even when both were incorrect, suggesting strong alignment in diagnostic reasoning. On NeuroBench, DIAGNO also outperformed the GPT-4o baseline and DeepSeek R1 across all top-k accuracy metrics.
In a real-world evaluation on eight complex cases with differentials from Mass General Brigham, neurologists rated DIAGNO's reasoning favorably (mean 4.03/5) across multiple dimensions of clinical utility and safety. CONCLUSIONS NeuroBench establishes neuropathological confirmation as the appropriate standard for evaluating diagnostic AI in neurology, moving beyond clinician-referenced benchmarking to define the ceiling of diagnostic accuracy. Evaluated against this standard, DIAGNO achieved expert-level diagnostic performance and received favorable clinician ratings in real-world applications, supporting its potential as a clinical decision-support tool in neurology.
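The evaluation above rests on top-k accuracy against an autopsy diagnosis and on counting disagreement cases. A minimal sketch of that bookkeeping follows, with two loud assumptions: exact string matching stands in for the blinded adjudicators, and a "disagreement" is taken to mean a case where exactly one reader has the confirmed diagnosis in its top k. The `needs_retrieval` gate and its 0.5 threshold are hypothetical; the abstract says only that lower-confidence cases go through RAG.

```python
from typing import Sequence, Tuple


def top_k_hit(differential: Sequence[str], truth: str, k: int = 3) -> bool:
    """True if the confirmed diagnosis appears among the first k ranked entries."""
    return truth in differential[:k]


def top_k_accuracy(differentials: Sequence[Sequence[str]],
                   truths: Sequence[str], k: int = 3) -> float:
    """Fraction of cases whose confirmed diagnosis is in the top-k list."""
    if len(differentials) != len(truths) or not truths:
        raise ValueError("differentials and truths must be non-empty and equal in length")
    hits = sum(top_k_hit(d, t, k) for d, t in zip(differentials, truths))
    return hits / len(truths)


def disagreement_tally(model: Sequence[Sequence[str]],
                       clinician: Sequence[Sequence[str]],
                       truths: Sequence[str], k: int = 3) -> Tuple[int, int]:
    """Count cases where exactly one reader has the truth in its top k.

    Returns (model-only correct, clinician-only correct).
    """
    model_only = clinician_only = 0
    for m, c, t in zip(model, clinician, truths):
        m_hit, c_hit = top_k_hit(m, t, k), top_k_hit(c, t, k)
        if m_hit and not c_hit:
            model_only += 1
        elif c_hit and not m_hit:
            clinician_only += 1
    return model_only, clinician_only


def needs_retrieval(confidence: float, threshold: float = 0.5) -> bool:
    """Hypothetical gate: route low-confidence cases to a RAG pass."""
    return confidence < threshold
```

In practice the string comparison in `top_k_hit` is the weakest link: synonymous diagnoses would need a mapping or human adjudication, which is why the study used blinded reviewers.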

17

RD-OMICS: An Integrative Multi-Omics Data Inventory in Rare Diseases

Sun, S.; Wang, H.; Mathe, E. A.; Zhu, Q.

2026-07-03 bioinformatics 10.64898/2026.06.29.735296 medRxiv

Top 0.4%

0.6%


Rare diseases (RD) affect over 30 million individuals in the United States, yet fewer than 5% of identified conditions have FDA-approved treatments. Progress in RD research is hindered by small patient cohorts, biological heterogeneity, and fragmented, inconsistently annotated public omics data, which limits integrative analysis and translational discovery. Here, we present RD-OMICS, a data inventory that integrates and structures RD omics data from the Gene Expression Omnibus (GEO) in the form of a knowledge graph. We developed a metadata harmonization pipeline that combines rule-based mapping with large language model (LLM)-assisted semantic categorization. The graph-based data model integrates diverse data types, including disease conditions, experiments, samples, platforms, projects, and publications, into a centralized inventory graph. In this preliminary study, 11,049 GEO series for 126 rare diseases were processed and integrated into RD-OMICS, which includes 375,930 individual biospecimen samples, 1,578 sequencing and array platforms, and 10,938 biological projects. Case studies demonstrate the use of RD-OMICS in supporting rare disease research, omics cohort construction, and transcriptome-based drug repurposing for amyotrophic lateral sclerosis (ALS). RD-OMICS provides a scalable foundation for transforming fragmented omics data into a structured, harmonized, and interoperable resource, facilitating therapeutic development and other translational discoveries in rare diseases.
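The harmonization pipeline described above pairs rule-based mapping with LLM-assisted categorization. One plausible ordering, sketched below under stated assumptions, is to try cheap deterministic rules first and send only the leftovers to the model. The exact-match table, the keyword regexes, and the case/control category set are all hypothetical, and `llm_fallback` is a stand-in callable: no LLM is contacted here, and the abstract does not describe its rules or prompts.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical exact-match table: normalized label -> harmonized category.
EXACT_MAP: Dict[str, str] = {
    "hc": "control",
    "ctrl": "control",
    "pt": "case",
}

# Hypothetical keyword rules, tried in order after the exact table.
PATTERN_RULES: List[Tuple[re.Pattern, str]] = [
    (re.compile(r"\b(healthy|control|normal)\b"), "control"),
    (re.compile(r"\b(patient|affected|case)\b"), "case"),
]


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace so trivial variants map together."""
    return re.sub(r"\s+", " ", text.strip().lower())


def harmonize(raw: str,
              llm_fallback: Callable[[str], Optional[str]]) -> Tuple[Optional[str], str]:
    """Return (category, provenance) for one free-text GEO sample annotation.

    Rules run first; only labels no rule covers reach the fallback, which
    stands in for an LLM call (hypothetical; none is made here).
    """
    key = normalize(raw)
    if key in EXACT_MAP:
        return EXACT_MAP[key], "exact"
    for pattern, category in PATTERN_RULES:
        if pattern.search(key):
            return category, "rule"
    return llm_fallback(key), "llm"
```

Returning the provenance string alongside the category makes it easy to audit, per graph node, which annotations came from deterministic rules and which depended on the model.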

18

Benchmarking AI-Driven PTIm-mAb Across Eleven FDA-Approved Bispecific Antibodies: A Cross-Tool Validation Study

Addepalli, M. K.; Prattipati, M.

2026-07-10 bioinformatics 10.64898/2026.07.07.736933 medRxiv

Top 0.4%

0.6%


Background: Late-stage attrition in therapeutic antibody discovery is dominated by developability liabilities: aggregation, polyspecificity, charge-driven non-specific binding, and chain-mispairing artefacts. Bispecific antibodies amplify these risks because each additional binding arm adds a new biophysical envelope that must be jointly satisfied. The existing in-silico ecosystem addresses individual axes of this problem (humanization, structure prediction, single-metric developability scoring), but few platforms integrate them end-to-end. PTIm-mAb (SANSHI Bio Solutions Pvt Ltd) is a multi-objective, AI/ML-driven antibody design platform that jointly optimizes sequence liabilities, surface aggregation, charge balance, humanness, and predicted binding affinity, and recommends a bispecific architecture in a single workflow. Methods: We applied PTIm-mAb to the published sequences of eleven FDA-approved bispecific antibodies using the platform's default-parameter Pareto-acceptance optimization loop, run to convergence or to the internal iteration ceiling, with no human curation between the platform run and the external profiler. Both wild-type and platform-optimized sequences were profiled independently with three publicly available developability tools: Aggrescan, CamSol, and the Therapeutic Antibody Profiler (TAP). Paired-sample tests (Wilcoxon signed-rank, exact binomial sign test, McNemar exact test) evaluated the direction and significance of changes. Results: Across the 17 evaluable paired arms profiled by TAP, PTIm-mAb cleared four wild-type CDR-vicinity Positive Charge Patch (PPC) flags: Blinatumomab-Arm1 (1.9952 → 0.6885), Mosunetuzumab-Arm1 (1.3391 → 0.0568), Linvoseltamab-Arm2 (0.8060 → 0.0), and Elranatamab-Arm1 (1.7981 → 0.5799). The headline Elranatamab-Arm1 case was achieved without trading off any other in-range metric and was corroborated by Aggrescan and CamSol on the same arm.
Total CDR length was significantly shortened across the cohort (Wilcoxon two-sided p = 0.0075, one-sided p = 0.0037, effect size r = 0.65), a significant improvement on the metric most directly under the optimizer's control. The directional shift in Aggrescan integrated aggregation propensity was also significant by sign test (24 of 36 chains improved, 2 unchanged, 10 worsened; p = 0.021). On the already-clean Zenocutuzumab profile, the optimizer identified residual headroom (PPC 0.1191 → 0.0; SFvCSP 12.5 → 6.0), demonstrating that the platform's value extends to candidates that pass all flags. Three cases (Teclistamab Arm-1, Emicizumab, and Talquetamab Arm-2) did not clear all flags and are presented as candidates for iterative re-invocation of the platform pipeline on the optimized output (planned follow-up; Section 5). The remaining TAP metrics (PSH, PPC magnitude, PNC, |SFvCSP|) trended in the improvement direction without reaching significance in this cohort, a pattern consistent with the expected statistical signature of a multi-objective optimizer applied to molecules already within the clinical-stage envelope. The platform reported a mean of 12.8 months and USD 723,889 of computational front-loading per project across the nine-project cohort (range 9.0-16.0 months; USD 510,000-960,000); the underlying cost assumptions are tabulated in Supplementary Table S3. Conclusion: PTIm-mAb produces externally verifiable, literature-aligned improvements on the metrics most directly under its control, clears CDR-vicinity charge-patch flags on a meaningful fraction of flagged candidates, and front-loads substantial design-iteration work. The cohort-level pattern is consistent with a calibrated multi-objective optimizer operating at the edge of detectable headroom on a deliberately hard benchmark. We position the platform as an early-stage triage and lead-optimization layer in bispecific antibody discovery.
For molecules whose first-pass result does not clear all flags, iterative re-invocation of the pipeline on the optimized output is a natural follow-up direction.
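The Aggrescan result above uses an exact binomial sign test: unchanged pairs are set aside, and the improved/worsened split is compared against a fair coin. A textbook sketch follows. Two points are assumptions rather than details from the abstract: whether a lower or higher Aggrescan value counts as "improved" (exposed here as `lower_is_better`), and whether the reported p-value is one- or two-sided (this returns the two-sided value).

```python
from math import comb
from typing import Sequence, Tuple


def count_signs(before: Sequence[float], after: Sequence[float],
                lower_is_better: bool = True) -> Tuple[int, int, int]:
    """Tally (improved, worsened, unchanged) paired values for one metric."""
    improved = worsened = unchanged = 0
    for b, a in zip(before, after):
        if a == b:
            unchanged += 1
        elif (a < b) == lower_is_better:
            improved += 1
        else:
            worsened += 1
    return improved, worsened, unchanged


def sign_test_two_sided(improved: int, worsened: int) -> float:
    """Exact two-sided binomial sign test under p = 0.5; ties are excluded."""
    n = improved + worsened
    if n == 0:
        return 1.0
    k = max(improved, worsened)
    # Probability of a split at least this lopsided in one direction.
    upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2.0 * upper_tail)
```

Because it uses only the direction of each change, the sign test is robust to the very different scales of the TAP, Aggrescan, and CamSol outputs, at the cost of ignoring effect size.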

19

Dynamic Graph Representation Learning for Data-Driven Huntington's Disease Staging: Evaluation Against Existing Embedding Methods and State-Space Models

Abu Zohair, L. M.; Zantout, H.; Gow, A. J.; Woodward, J.; Lones, M.; Vallejo, M.

2026-06-30 health informatics 10.64898/2026.06.27.26355575 medRxiv

Top 0.4%

0.6%


Huntington's disease (HD) presents a heterogeneous neurodegenerative course, with motor, cognitive, and functional symptoms progressing differently across individuals. This atypical progression complicates the definition of discrete disease stages, hindering understanding of disease trajectories, timely patient care, and therapy development. Consequently, current clinical staging systems rely heavily on clinician-defined, domain-specific criteria and fixed clinical measurement boundaries for stage assignment, reducing objectivity and often leading to overlapping clinical measurements across stages. While machine learning methods can help, existing approaches cannot fully capture complex temporal relationships within and across patients. We propose URL-STFN, a dynamic graph-based representation learning model that encodes both inter- and intra-patient temporal patterns from longitudinal clinical measures. We then evaluate disease stages formed through clustering and stability analysis of URL-STFN latent representations, and compare them with representations obtained from conventional embedding approaches. We further benchmark these clustering-based stages against states derived from conventional temporal models, including DHMM. We hypothesize that clustering URL-STFN latent representations enables identification of HD stages with reduced overlap in clinical measurements. The proposed framework is evaluated using 1,477 clinical visits from the Enroll-HD dataset, a large longitudinal cohort with repeated clinical assessments. For staging, we used 44 clinical measurements spanning motor, cognitive, and functional domains. URL-STFN identifies clinically meaningful HD stages consistent with established disease progression while reducing overlap in clinical feature values compared with DHMM-derived and clinical staging approaches.
These findings highlight the potential of a dynamic graph-based representation learning and clustering framework to support more objective, data-driven, and precise HD staging.
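The central hypothesis above is that stages obtained by clustering URL-STFN latents overlap less on clinical measurements. The abstract does not say how overlap is quantified, so the following is one hypothetical measure, not the authors': for a single measurement, take each stage's [min, max] value range and compute the intersection-over-union between stages. It assumes stage labels already exist (for example from clustering the latent vectors, which is not shown) and scores one measurement at a time, whereas the study used 44.

```python
from itertools import combinations
from typing import Dict, Sequence


def range_overlap(a: Sequence[float], b: Sequence[float]) -> float:
    """Intersection-over-union of the [min, max] ranges of two value sets.

    0.0 means the stages are cleanly separated on this measurement;
    1.0 means identical ranges.
    """
    lo, hi = max(min(a), min(b)), min(max(a), max(b))
    intersection = max(0.0, hi - lo)
    union = max(max(a), max(b)) - min(min(a), min(b))
    return intersection / union if union > 0 else 1.0


def mean_stage_overlap(values_by_stage: Dict[int, Sequence[float]],
                       adjacent_only: bool = True) -> float:
    """Average pairwise range overlap of one clinical measurement across stages.

    Stages are ordered by their integer label; adjacent_only compares
    consecutive stages, otherwise every pair of stages is compared.
    """
    stages = sorted(values_by_stage)
    pairs = (list(zip(stages, stages[1:])) if adjacent_only
             else list(combinations(stages, 2)))
    if not pairs:
        return 0.0
    return sum(range_overlap(values_by_stage[s], values_by_stage[t])
               for s, t in pairs) / len(pairs)
```

Min/max ranges are sensitive to single outlying visits; swapping in interquartile ranges would give a more robust variant of the same idea.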

20

BGC-QDR: A Quantum-Assisted Pipeline for Biosynthetic Gene Cluster Discovery and Ranking from Environmental DNA

Mishra, A.; Rai, A.

2026-06-25 bioinformatics 10.64898/2026.06.21.733574 medRxiv

Top 0.4%

0.6%


Biosynthetic gene clusters (BGCs) encode enzymatic pathways for natural products with pharmaceutical potential, yet prioritizing candidates from fragmented environmental DNA (eDNA) assemblies remains computationally challenging. We present BGC-QDR (Biosynthetic Gene Cluster Quantum Discovery and Ranking), an open-source pipeline that integrates input quality control, Prodigal ORF prediction, Pfam HMM domain annotation, rule-based BGC classification, MIBiG 4.0 novelty assessment, and variational quantum classifier (VQC) ranking via PennyLane. BGC-QDR is designed as a quantum-assisted ranking framework for biologically informed BGC prioritization, not as a claim of quantum computational advantage over classical machine learning. We evaluate the pipeline on MIBiG 4.0 (2,636 annotated BGCs) using a 20-dimensional biosynthetic feature vector and stratified 10-fold cross-validation. The integrated VQC (6 qubits × 3 layers, 54 parameters) achieves an accuracy of 0.789 ± 0.076 and a ROC-AUC of 0.835 ± 0.057. Random Forest achieves the highest ROC-AUC (0.898 ± 0.032), followed by Logistic Regression (0.874 ± 0.020) and MLP (0.872 ± 0.024). Wilcoxon signed-rank tests on per-fold AUC scores show that the VQC ROC-AUC is significantly lower than that of Random Forest (p = 0.0098) and Logistic Regression (p = 0.037) at α = 0.05, with no significant difference versus MLP (p = 0.064). Architecture ablation identifies 4 qubits × 3 layers as the best VQC configuration on hold-out validation (AUC = 0.737). Feature importance analysis highlights peptidyl carrier protein domains, cluster length, and module count as dominant predictors. BGC-QDR provides a reproducible, end-to-end workflow for eDNA-derived BGC discovery with integrated novelty scoring and quantum-assisted candidate ranking. The complete BGC-QDR source code, benchmark datasets, and reproduction instructions are publicly available at: Abhishekmishra2808/BGC-PIPELINE
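The model comparison above applies Wilcoxon signed-rank tests to paired per-fold AUCs. With only 10 folds the exact null distribution is cheap to enumerate (2^10 = 1024 sign patterns), and the smallest attainable two-sided p-value is 2/1024, about 0.002. The usual library route is `scipy.stats.wilcoxon`; the sketch below is a dependency-free exact version. It drops zero differences and uses average ranks for ties, which is one common convention; the abstract does not say which convention was used, and the per-fold AUC vectors are not reported, so any inputs are placeholders.

```python
from itertools import product
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for m in range(i, j + 1):
            ranks[order[m]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def wilcoxon_exact_two_sided(x: Sequence[float], y: Sequence[float]) -> float:
    """Exact two-sided Wilcoxon signed-rank p-value for paired samples.

    Zero differences are dropped; the null distribution of W+ is built by
    enumerating all 2**n sign assignments, so keep n small (e.g. CV folds).
    """
    # Rounding guards against float noise creating spurious non-ties.
    diffs = [round(a - b, 12) for a, b in zip(x, y)]
    diffs = [d for d in diffs if d != 0]
    n = len(diffs)
    if n == 0:
        return 1.0
    if n > 20:
        raise ValueError("exact enumeration is only practical for n <= 20")
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    at_most = at_least = 0
    for signs in product((False, True), repeat=n):
        w = sum(r for r, positive in zip(ranks, signs) if positive)
        if w <= w_plus + 1e-9:
            at_most += 1
        if w >= w_plus - 1e-9:
            at_least += 1
    return min(1.0, 2.0 * min(at_most, at_least) / 2 ** n)
```

Called as `wilcoxon_exact_two_sided(rf_fold_aucs, vqc_fold_aucs)` on two equal-length lists of per-fold AUCs, it tests whether the paired differences are centered on zero.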